{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Transformation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "In order to verify the robustness comprehensively, textflint offers 20 universal transformations and 60 task-specific transformations, covering 12 NLP tasks. The full list of `Transformation`s can be found in our [website](https://www.textflint.com) or [github](https://github.com/textflint/textflint)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append('/home/yjc/codes/textrobustness')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to use a built-in Transformation\n",
    "textflint offers multiple `Transformation`s for each task. Here, we give an example on how to use the `AddSum` `Transformation` on Sentiment Analysis (SA) task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Import the Sample for SA task and the transformation AddSum \n",
    "from textflint.input_layer.component.sample.sa_sample import SASample\n",
    "from textflint.generation_layer.transformation.SA.add_sum import AddSum\n",
    "\n",
    "# 2. Define the SA data sample.\n",
    "data = {'x': \"Brilliant and moving performances by Tom Courtenay and Peter Finch\",\n",
    "        'y': 'positive'}\n",
    "sa_sample = SASample(data)\n",
    "\n",
    "# 3. Define the parameter of AddSum, here we only add summary for each person.\n",
    "trans = AddSum(entity_type='person')\n",
    "\n",
    "# 4. Transform the sample\n",
    "trans_sample = trans.transform(sa_sample)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'x': 'Brilliant and moving performances by Tom Courtenay (Sir Thomas Daniel Courtenay (/ˈkɔːrtni/; born 25 February 1937) is an English actor of stage and screen. After studying at the Royal Academy of Dramatic Art, Courtenay achieved prominence in the 1960s with a series of acclaimed film roles, including The Loneliness of the Long Distance Runner (1962)\\u2060, for which he received the BAFTA Award for Most Promising Newcomer to Leading Film Roles\\u2060, and Doctor Zhivago (1965), for which he received an Academy Award nomination for Best Supporting Actor. Other notable film roles during this period include Billy Liar (1963), King and Country (1964), for which he was awarded the Volpi Cup for Best Actor at the Venice Film Festival, King Rat (1965), and The Night of the Generals .) and Peter Finch (Frederick George Peter Ingle Finch (28 September 1916 \\xa0 – 14 January 1977) was an English-Australian actor. He is best remembered for his role as crazed television anchorman Howard Beale in the 1976 film Network, which earned him a posthumous Academy Award for Best Actor, his fifth Best Actor award from the British Academy of Film and Television Arts, and a Best Actor award from the Golden Globes. )',\n",
       " 'y': 'positive',\n",
       " 'sample_id': None}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trans_sample[0].dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that a corresponding description has been added after Tom Courtenay and Peter Finch. This transformation should not change the label `y`, so it is still `positive`. The `sample_id` is used when we transform a dataset, and can be ignored here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define your own Transformation\n",
    "### Bread word swap\n",
    "As an introduction to writing transformations for textflint, we’re going to try a very simple transformation: one that replaces random words of a text with the word ‘bread’. In textflint, there’s an abstract `WordSubstitute` class that handles the heavy lifting of breaking sentences into words and avoiding replacement of stopwords. We can extend `WordSubstitute` and implement a single method, `_get_candidates`, to indicate to replace each word with ‘bread’. 🍞"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from textflint.generation_layer.transformation import WordSubstitute\n",
    "\n",
    "class BreadWordSwap(WordSubstitute):\n",
    "    r\"\"\"\n",
    "    Word Swap by randomly swaping words with bread.\n",
    "\n",
    "    \"\"\"\n",
    "    def _get_candidates(self, word, pos=None, n=1):\n",
    "        r\"\"\"\n",
    "        Returns a list containing apple.\n",
    "\n",
    "        :param word: str, the word to replace\n",
    "        :param pos: str, the pos of the word to replace\n",
    "        :param n: the number of returned words\n",
    "        :return: a candidates list\n",
    "        \"\"\"\n",
    "        return ['bread']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Try the Freshly-baked `BreadWordSwap`\n",
    "Once we have created our own transformation method, we can generate a transformed sample though following commands:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "Can't instantiate abstract class BreadWordSwap with abstract methods skip_aug",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-5-99a246e2918b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtrans\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mBreadWordSwap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mtrans_sample\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtrans\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtransform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msa_sample\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0mtrans_sample\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdump\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mTypeError\u001b[0m: Can't instantiate abstract class BreadWordSwap with abstract methods skip_aug"
     ]
    }
   ],
   "source": [
    "trans = BreadWordSwap()\n",
    "trans_sample = trans.transform(sa_sample)\n",
    "trans_sample[0].dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that only one word `performances` is replaced by `bread`. We can control the number of replaced words by setting the parameters `trans_min`, `trans_max` or `trans_p`. For example, if you want to get a sample with at least three words are replaced by `bread`, you can set `trans_min` to 3 as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trans = BreadWordSwap(trans_min=3)\n",
    "trans_sample = trans.transform(sa_sample)\n",
    "trans_sample[0].dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "In this tutorial, we show how to use a built-in `Transformation` `AddSum` and define our own `Transformation` `BreadWordSwap`. You also learned how to control the number of transformed words. In fact, you may also find that replacing three words in `BreadWordSwap` makes the result sample meaningless and hard-to-read. Our next section will show you how to use `Validator` to filter out sample that are useless. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": [
     "\n"
    ]
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
